cellwise outlier
Cellwise and Casewise Robust Covariance in High Dimensions
Centofanti, Fabio, Hubert, Mia, Rousseeuw, Peter J.
The sample covariance matrix is a cornerstone of multivariate statistics, but it is highly sensitive to outliers. These can be casewise outliers, such as cases belonging to a different population, or cellwise outliers, which are deviating cells (entries) of the data matrix. Recently some robust covariance estimators have been developed that can handle both types of outliers, but their computation is only feasible up to at most 20 dimensions. To remedy this we propose the cellRCov method, a robust covariance estimator that simultaneously handles casewise outliers, cellwise outliers, and missing data. It relies on a decomposition of the covariance on principal and orthogonal subspaces, leveraging recent work on robust PCA. It also employs a ridge-type regularization to stabilize the estimated covariance matrix. We establish some theoretical properties of cellRCov, including its casewise and cellwise influence functions as well as consistency and asymptotic normality. A simulation study demonstrates the superior performance of cellRCov in contaminated and missing data scenarios. Furthermore, its practical utility is illustrated in a real-world application to anomaly detection. We also construct and illustrate the cellRCCA method for robust and regularized canonical correlation analysis.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)
- Asia > India > NCT > New Delhi (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Data Science > Data Mining > Anomaly Detection (0.68)
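The cellRCov abstract above mentions a ridge-type regularization that stabilizes the estimated covariance matrix, without giving its exact form. As a minimal sketch of the general idea only (not the cellRCov estimator itself), the Python snippet below shrinks a sample covariance toward a scaled identity matrix. The function name `ridge_regularize`, the weight `rho`, and the choice of target are all assumptions made for illustration.

```python
import numpy as np

def ridge_regularize(cov, rho):
    """Shrink a covariance estimate toward a scaled identity target.

    Hypothetical helper, not taken from the paper: `rho` in [0, 1] is the
    shrinkage weight and the target is (trace(cov) / d) * I.
    """
    cov = np.asarray(cov, dtype=float)
    d = cov.shape[0]
    target = (np.trace(cov) / d) * np.eye(d)
    return (1.0 - rho) * cov + rho * target

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 30))        # n = 10 cases, d = 30 variables
S = np.cov(X, rowvar=False)          # rank-deficient because n < d
S_reg = ridge_regularize(S, rho=0.1)
```

With more variables than cases the sample covariance `S` is singular, whereas `S_reg` has strictly positive eigenvalues and can be inverted.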
Conformal Prediction with Cellwise Outliers: A Detect-then-Impute Approach
Peng, Qian, Bao, Yajie, Ren, Haojie, Wang, Zhaojun, Zou, Changliang
Conformal prediction is a powerful tool for constructing prediction intervals for black-box models, providing a finite sample coverage guarantee for exchangeable data. However, this exchangeability is compromised when some entries of the test feature are contaminated, such as in the case of cellwise outliers. To address this issue, this paper introduces a novel framework called detect-then-impute conformal prediction. This framework first employs an outlier detection procedure on the test feature and then utilizes an imputation method to fill in those cells identified as outliers. To quantify the uncertainty in the processed test feature, we adaptively apply the detection and imputation procedures to the calibration set, thereby constructing exchangeable features for the conformal prediction interval of the test label. We develop two practical algorithms, PDI-CP and JDI-CP, and provide a distribution-free coverage analysis under some commonly used detection and imputation procedures. Notably, JDI-CP achieves a finite sample $1-2\alpha$ coverage guarantee. Numerical experiments on both synthetic and real datasets demonstrate that our proposed algorithms exhibit robust coverage properties and comparable efficiency to the oracle baseline.
- Asia > China > Shanghai > Shanghai (0.04)
- South America > Brazil (0.04)
- North America > United States > California > Orange County > Irvine (0.04)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.67)
- Health & Medicine > Therapeutic Area > Oncology (0.46)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
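For readers unfamiliar with the baseline that the detect-then-impute abstract above builds on, the following is a minimal sketch of standard split conformal prediction with absolute-residual scores. It is not PDI-CP or JDI-CP and contains no detection or imputation step. The function name `split_conformal_interval` and its arguments are hypothetical.

```python
import math
import numpy as np

def split_conformal_interval(cal_residuals, prediction, alpha):
    """Standard split conformal interval around a point prediction.

    cal_residuals: residuals y_i - f(x_i) on a held-out calibration set.
    Returns (lower, upper); the interval is infinite when the calibration
    set is too small for the requested level.
    """
    scores = np.sort(np.abs(np.asarray(cal_residuals, dtype=float)))
    n = scores.size
    # Rank of the conformal quantile: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return (-math.inf, math.inf)
    q = scores[k - 1]
    return (prediction - q, prediction + q)
```

Under exchangeability of the calibration and test points, such an interval covers the test label with probability at least $1-\alpha$; contaminated cells in the test feature break that exchangeability, which is the problem the paper addresses.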
Robust Multilinear Principal Component Analysis
Hirari, Mehdi, Centofanti, Fabio, Hubert, Mia, Van Aelst, Stefan
Multilinear Principal Component Analysis (MPCA) is an important tool for analyzing tensor data. It performs dimension reduction similar to PCA for multivariate data. However, standard MPCA is sensitive to outliers. It is highly influenced by observations deviating from the bulk of the data, called casewise outliers, as well as by individual outlying cells in the tensors, so-called cellwise outliers. This latter type of outlier is highly likely to occur in tensor data, as tensors typically consist of many cells. This paper introduces a novel robust MPCA method that can handle both types of outliers simultaneously, and can cope with missing values as well. This method uses a single loss function to reduce the influence of both casewise and cellwise outliers. The solution that minimizes this loss function is computed using an iteratively reweighted least squares algorithm with a robust initialization. Graphical diagnostic tools are also proposed to identify the different types of outliers that have been found by the new robust MPCA method. The performance of the method and associated graphical displays is assessed through simulations and illustrated on two real datasets.
- Africa > Senegal > Kolda Region > Kolda (0.04)
- North America > United States > New Jersey > Hudson County > Hoboken (0.04)
- Europe > United Kingdom > England (0.04)
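The remark in the robust MPCA abstract above, that cellwise outliers are highly likely in tensor data because tensors consist of many cells, can be made concrete with a short calculation. The sketch below assumes that each cell is contaminated independently with the same probability; that independence assumption, like the function name `contaminated_fraction`, is ours and not part of the abstract.

```python
def contaminated_fraction(cell_rate, n_cells):
    """Probability that an observation has at least one outlying cell,
    assuming cells are contaminated independently at rate `cell_rate`."""
    return 1.0 - (1.0 - cell_rate) ** n_cells

# A 10 x 10 tensor observation has 100 cells.
fraction = contaminated_fraction(0.01, 100)
```

With only 1% of cells contaminated, about 63% of such observations contain at least one outlying cell, so discarding whole observations would throw away most of the data.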
The Cellwise Minimum Covariance Determinant Estimator
Raymaekers, Jakob, Rousseeuw, Peter J.
The usual Minimum Covariance Determinant (MCD) estimator of a covariance matrix is robust against casewise outliers. These are cases (that is, rows of the data matrix) that behave differently from the majority of cases, raising suspicion that they might belong to a different population. On the other hand, cellwise outliers are individual cells in the data matrix. When a row contains one or more outlying cells, the other cells in the same row still contain useful information that we wish to preserve. We propose a cellwise robust version of the MCD method, called cellMCD. Its main building blocks are observed likelihood and a penalty term on the number of flagged cellwise outliers. It possesses good breakdown properties. We construct a fast algorithm for cellMCD based on concentration steps (C-steps) that always lower the objective. The method performs well in simulations with cellwise outliers, and has high finite-sample efficiency on clean data. It is illustrated on real data with visualizations of the results.
- North America > United States > New York (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Switzerland > Basel-City > Basel (0.04)
- Transportation > Passenger (1.00)
- Transportation > Ground > Road (1.00)
- Automobiles & Trucks > Manufacturer (1.00)
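The cellMCD abstract above refers to concentration steps (C-steps) that always lower the objective. As a rough illustration of the concept, here is a sketch of the C-step for the classical casewise MCD, where each step keeps the `h` cases with the smallest Mahalanobis distances and the covariance determinant cannot increase. This is not the cellMCD algorithm, which works on individual cells with an observed likelihood and a penalty term. The function names and the simulated data are assumptions.

```python
import numpy as np

def c_step(X, subset, h):
    """One casewise C-step: refit on `subset`, then keep the h closest cases."""
    mu = X[subset].mean(axis=0)
    cov = np.cov(X[subset], rowvar=False)
    diff = X - mu
    # Squared Mahalanobis distance of every case to the current fit.
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    return np.argsort(d2)[:h]

def subset_logdet(X, subset):
    """Log-determinant of the covariance of the selected cases."""
    _, logdet = np.linalg.slogdet(np.cov(X[subset], rowvar=False))
    return logdet

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
X[:20] += 8.0                        # shift 20 rows: casewise outliers
h = 150
subset = rng.choice(len(X), size=h, replace=False)
logdets = [subset_logdet(X, subset)]
for _ in range(10):
    subset = c_step(X, subset, h)
    logdets.append(subset_logdet(X, subset))
```

The sequence `logdets` is non-increasing, which is the property that makes iterating C-steps from several starting subsets a practical search strategy.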